Title: Implementing the Pairs Trading Strategy on Crypto Markets¶

Eshan Kaul

Background¶

Cryptocurrencies are an emerging asset class, and the recent surge in their popularity has made them an essential part of investment portfolios for retail and institutional investors alike. As cryptocurrency prices continue to break previous highs, the race is on for investors to develop trading strategies that can take advantage of the high volatility and fluctuation in crypto markets. Pairs trading is a market-neutral strategy that uses the mean reversion principle to profit from pricing inefficiencies between two highly correlated assets. In a market riddled with inefficiencies, the pairs trading strategy may be a highly effective and lucrative strategy for investors in crypto markets.

In [1]:
import datetime as dt

import numpy as np
import pandas as pd
import pandas_datareader as web
import statsmodels.api as sm
import statsmodels.tsa.stattools as ts
from statsmodels.tsa.stattools import coint
from statsmodels.tsa.vector_ar.vecm import coint_johansen

import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
%matplotlib inline

Read in Cryptocurrency Data¶

The code below reads in cryptocurrency price data from Yahoo Finance using the pandas_datareader and datetime packages. Some minor data cleaning was required to prepare the dataset for analysis and visualizations.

In [2]:
start = dt.datetime(2018,1,1)
end = dt.datetime.now()
assets = ['BTC-USD', 'ETH-USD', 'USDT-USD', 'BNB-USD', 'USDC-USD', 'XRP-USD', 'SOL-USD', 'LUNA1-USD', 'DOGE-USD', 'SHIB-USD']


CryptoDF = web.DataReader(assets, 'yahoo', start, end)
CryptoDF = CryptoDF.dropna()
In [3]:
cryptoDF = CryptoDF['Adj Close']
cryptoDF.reset_index(inplace=True)
crypto = px.line(cryptoDF, x = "Date", y = ['BTC-USD', 'ETH-USD', 'USDT-USD', 'BNB-USD', 'USDC-USD', 'XRP-USD', 'SOL-USD', 'LUNA1-USD', 'DOGE-USD', 'SHIB-USD'], title = "Cryptocurrencies Adj Close Time Series")
crypto.update_xaxes(
    rangeslider_visible = True,
    rangeselector = dict(
        buttons = list([
            dict(count = 1, label = "1m", step = "month", stepmode = "backward"),
            dict(count = 6, label = "6m", step = "month", stepmode = "backward"),
            dict(count = 1, label = "YTD", step = "year", stepmode = "todate"),
            dict(count = 1, label = "1y", step = "year", stepmode = "backward"),
            dict(step = "all")
        ])
    )
)
crypto.show()

As seen above, it is difficult to determine any relationships between the different cryptocurrencies using the adjusted closing prices. Instead of prices, log returns will be used to normalize the data so that all variables are on a comparable scale. This enables the evaluation of analytic relationships among several cryptocurrencies even though they originate from price series of very different magnitudes.

Assuming that the prices of the above cryptocurrencies are distributed log-normally, or approximately so, the log returns are approximately normally distributed.

$$ r_t = \log(1 + R_t) = \log \frac{P_t}{P_{t-1}} = \log(P_t) - \log(P_{t-1}) $$

For further details see: https://quantivity.wordpress.com/2011/02/21/why-log-returns/

In [4]:
df = CryptoDF['Adj Close']
log_returns = np.log(df).diff()
log_returns = log_returns.dropna()
log_returns.reset_index(inplace=True)
log_returns.replace([np.inf, -np.inf], np.nan, inplace=True)
log_returns = log_returns.dropna()
log_returns.head()
Out[4]:
Symbols Date BTC-USD ETH-USD USDT-USD BNB-USD USDC-USD XRP-USD SOL-USD LUNA1-USD DOGE-USD SHIB-USD
1 2021-04-17 -0.014543 -0.036451 0.009122 0.019788 0.009409 0.001622 -0.011783 -0.029507 -0.252695 0.693147
2 2021-04-18 -0.076472 -0.047044 -0.011360 -0.078691 -0.010346 -0.103974 0.252295 -0.135276 0.120221 0.000000
3 2021-04-19 -0.008789 -0.032228 -0.000239 0.048804 -0.000128 -0.066877 -0.019602 0.037210 0.239790 0.693147
4 2021-04-20 0.013348 0.072990 0.000073 0.150332 0.000087 0.050601 0.002968 -0.034691 -0.242837 -0.693147
5 2021-04-21 -0.046520 0.014714 -0.000006 -0.072412 0.000080 -0.064804 0.023117 -0.011301 -0.040154 -0.693147
In [5]:
crypto = px.line(log_returns, x = "Date", y = ['BTC-USD', 'ETH-USD', 'USDT-USD', 'BNB-USD', 'USDC-USD', 'XRP-USD', 'SOL-USD', 'LUNA1-USD', 'DOGE-USD', 'SHIB-USD'], title = "Log Return Time Series")
crypto.update_xaxes(
    rangeslider_visible = True,
    rangeselector = dict(
        buttons = list([
            dict(count = 1, label = "1m", step = "month", stepmode = "backward"),
            dict(count = 6, label = "6m", step = "month", stepmode = "backward"),
            dict(count = 1, label = "YTD", step = "year", stepmode = "todate"),
            dict(count = 1, label = "1y", step = "year", stepmode = "backward"),
            dict(step = "all")
        ])
    )
)
crypto.show()

Correlation vs. Cointegration¶

Although correlation and cointegration are similar, there are some key differences between the two, which are explored below.

Correlation¶

Definition: Correlation is any statistical relationship, whether causal or not, between two random variables or bivariate data. The Pearson correlation coefficient is obtained by dividing the covariance (the joint variability of the two random variables) by the product of their standard deviations, which normalizes it to lie between -1 and 1.

$$ \rho_{X,Y} = \text{corr}(X,Y) = \frac{\text{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$

$$ \text{cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$$
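The definition can be checked directly against NumPy's built-in implementation on simulated data (the series below are illustrative and not drawn from the crypto dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=0.6, size=500)  # correlated by construction

# Pearson correlation straight from the definition: cov(X, Y) / (sigma_X * sigma_Y)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov_xy / (x.std() * y.std())

print(rho, np.corrcoef(x, y)[0, 1])  # the two values agree
```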

Cointegration¶

Definition: Cointegration concerns non-stationary time series processes whose means and variances vary over time. It allows for estimating the long-run parameters or equilibrium in systems with unit root variables. If two or more series are individually integrated (in the time series sense) but some linear combination of them has a lower order of integration, then the series are said to be cointegrated.

In most cases, when the individual series $Y_{1,t}$ and $Y_{2,t}$ are non-stationary first-order integrated variables, $I(1)$, a linear combination of these variables is also non-stationary. However, if there exists some (cointegrating) vector of coefficients that forms a stationary linear combination of them, the two series are said to be cointegrated.

Consider the two series $Y_{1,t}$ and $Y_{2,t}$, both integrated of the first order, $I(1)$. Regressing one on the other yields the residual $u_t = Y_{1,t} - \beta_1 Y_{2,t}$. If the error term $u_t$ is stationary, $I(0)$, then by definition the combination $Y_{1,t} - \beta_1 Y_{2,t}$ is also stationary. Although $Y_{1,t}$ and $Y_{2,t}$ each have distinct stochastic trends, the series are cointegrated because the linear combination $Y_{1,t} - \beta_1 Y_{2,t}$ has the statistical properties of an $I(0)$ series.

Cointegration Tests¶

Engle-Granger two-step method¶

The Engle-Granger test is one of the most common tests used to measure cointegration. It constructs residuals from a static regression of one series on the other and checks those residuals for the presence of a unit root using an Augmented Dickey-Fuller (ADF) test. If the residuals are stationary, the two time series are said to be cointegrated.

  1. The first step of this method is to verify that the individual time series are non-stationary. This is done using a standard unit root test such as the ADF test. Both series must be non-stationary first-order integrated variables, $I(1)$. Consider the two series $Y_{1,t}$ and $Y_{2,t}$, both integrated of the first order, $I(1)$. The following ECM is estimated: ${\displaystyle A(L)\,\Delta y_{t}=\gamma +B(L)\,\Delta x_{t}+\alpha (y_{t-1}-\beta _{0}-\beta _{1}x_{t-1})+\nu _{t}.}$ If both variables are integrated and this ECM exists, they are cointegrated by the Engle–Granger representation theorem.
  2. The second step is to estimate the model ${\displaystyle y_{t}=\beta _{0}+\beta _{1}x_{t}+\varepsilon _{t}}$ using ordinary least squares (OLS). If the regression is not spurious, as determined by the criteria described above, OLS is valid and yields a consistent estimator (converging in probability to the true value of the parameter). The estimated residuals ${\displaystyle {\hat {\varepsilon }}_{t}=y_{t}-{\hat {\beta }}_{0}-{\hat {\beta }}_{1}x_{t}}$ are saved and used in a regression of differenced variables plus the lagged error term: ${\displaystyle A(L)\,\Delta y_{t}=\gamma +B(L)\,\Delta x_{t}+\alpha {\hat {\varepsilon }}_{t-1}+\nu _{t}.}$ From here, the test for cointegration uses a standard t-statistic on $\alpha$: if the corresponding p-value is at most 0.05, the two time series are judged cointegrated.

The test rejects the null of no cointegration if the p-value < 0.05, in which case there is no need to find an order of differencing. If the p-value is > 0.05, we fail to reject the null, and differencing operations must be applied until the series are stationary.

Issues with the Engle-Granger Test¶

  1. The univariate unit root tests used in the first stage have low statistical power
  2. The choice of the dependent variable in the first stage influences test results, i.e. we need weak exogeneity for $x_t$ as determined by Granger causality
  3. One can potentially have a small sample bias
  4. The cointegration test on $\alpha$ does not follow a standard distribution
  5. The validity of the long-run parameters in the first regression stage where one obtains the residuals cannot be verified because the distribution of the OLS estimator of the cointegrating vector is highly complicated and non-normal
  6. At most one cointegrating relationship can be examined.

Johansen Test¶

Many of the weaknesses of the Engle-Granger test are addressed by the Johansen test, a vector error correction model (VECM)-based procedure that assesses cointegrating relationships using a maximum likelihood estimation (MLE) approach. The Johansen test covers the multivariate case by determining the rank of $\Pi$, i.e. the number of non-zero eigenvalues of $\Pi$. It offers two statistics for estimating the rank: the trace statistic and the maximum eigenvalue statistic.

VAR¶

Consider a first order Vector Autoregression $VAR(1)$ for the $n\times1$ vector $\boldsymbol{y}_{t}=[y_{1,t}, y_{2,t}, \ldots, y_{n,t}]^{\prime}$: $$\begin{eqnarray} \boldsymbol{y}_{t}=\mu+\Pi_{1} \boldsymbol{y}_{t-1}+\boldsymbol{u}_{t} \end{eqnarray}$$ where $\mu=[\mu_{1}, \mu_{2}, \ldots, \mu_{n}]^{\prime}$ is a vector of constants, $\boldsymbol{u}_{t}=[u_{1,t}, u_{2,t}, \ldots, u_{n,t}]^{\prime}$ is a vector of error terms, and $\Pi_{1}$ is an $(n \times n)$ matrix of coefficients.

The stability of the VAR model is determined by the eigenvalues of $\Pi_{1}$, which are obtained by solving the characteristic equation $\begin{eqnarray} | \; \Pi_{1}-\lambda I \; |=0 \end{eqnarray}$; the VAR is stable if all eigenvalues have modulus less than 1.

Consider the example when $n = 2$. The bivariate VAR(1) is: $$\begin{bmatrix} y_{1,t}\\ y_{2,t} \end{bmatrix} = \begin{bmatrix} \mu_{1}\\ \mu_{2} \end{bmatrix} + \begin{bmatrix} \pi_{1,1} & \pi_{1,2}\\ \pi_{2,1} & \pi_{2,2} \end{bmatrix} \begin{bmatrix} y_{1,t-1}\\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t}\\ \varepsilon_{2,t} \end{bmatrix}$$ If one or more of the eigenvalues has a modulus greater than or equal to 1, the VAR is unstable and nonstationary. Extending this concept to cointegration for multivariate VAR models requires the specification of the vector error correction model.
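The stability condition can be checked numerically for an illustrative coefficient matrix (the entries of $\Pi_1$ here are assumptions chosen for the example):

```python
import numpy as np

# Illustrative coefficient matrix of a bivariate VAR(1)
Pi1 = np.array([[0.5, 0.2],
                [0.1, 0.6]])

# Stability: every eigenvalue of Pi_1 must have modulus strictly below 1
eigvals = np.linalg.eigvals(Pi1)
print('eigenvalue moduli:', np.abs(eigvals))
print('stable VAR:', bool(np.all(np.abs(eigvals) < 1)))
```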

VECM¶

Rewriting the VAR(1) as a VECM by differencing the series yields: $$\begin{eqnarray} \nonumber \Delta {\bf{y}}_{t} & = & \mu+(\Pi_{1}-I){\bf{y}}_{t-1}+{\bf{u}}_{t} \\ & = & \mu+\Pi {\bf{y}}_{t-1}+{\bf{u}}_{t} \;\;\; \text{where } \; \Pi=(\Pi_{1}-I) \end{eqnarray} $$

Hypotheses¶

With $n$ variables, the number of stationary linear combinations of the variables in $\boldsymbol{y}_{t}$ gives the number of cointegration vectors.

  1. $\Pi$ has full rank, $r = n$: the VAR must be stable, as there is no instability present in the system of equations.
  2. $\Pi$ has rank $1\leq r\leq n-1$: the number of stationary linear combinations is smaller than the number of variables. Hence some of the variables must be unstable, while at least one combination of the variables is stable. The number of cointegrating vectors is given by $r$.
  3. $\Pi$ has rank $r=0$, i.e. $\Pi = 0$: there is instability and no combination of the variables is stable. The unstable VAR cannot be cointegrated and should be estimated in first differences.

When the variables of a $VAR(p)$ are cointegrated, the VECM includes the long-run cointegration relations as well as the speed-of-adjustment parameters. $\Pi$ is then decomposed as $\begin{eqnarray*} \Pi=\alpha\beta^{\prime} \end{eqnarray*}$ where $\alpha$ and $\beta$ have dimensions $n \times r$. $\beta$ is a matrix of cointegration parameters, such that the linear combinations $\beta^{\prime} {\bf{y}}_{t}$ are stationary. Each of the $r$ rows of $\beta^{\prime} {\bf{y}}_{t}$ is a cointegrated long-run relation inducing stability. $\alpha$ is a matrix containing the speed-of-adjustment parameters, which govern how quickly the system moves back to equilibrium.

Considering the original bivariate case: $$\begin{eqnarray} \left[\begin{array}{c} \Delta y_{1,t}\\ \Delta y_{2,t} \end{array}\right] = \left[\begin{array}{c} \mu_{1}\\ \mu_{2} \end{array}\right] + \left[\begin{array}{c} \alpha_{1}\\ \alpha_{2} \end{array}\right] \Big[\beta_{1} \beta_{2}\Big] \left[\begin{array}{c} y_{1,t-1}\\ y_{2,t-1} \end{array} \right] + \left[\begin{array}{c} u_{1,t}\\ u_{2,t} \end{array}\right] \end{eqnarray} $$

The cointegration relationship $\beta^{\prime} {\bf{y}}_{t}$ is given by: $$\begin{eqnarray*} \beta^{\prime}{\bf{y}}_{t}=\beta_{1}y_{1,t}+\beta_{2}y_{2,t}\sim I(0) \end{eqnarray*} $$

Generalizing to $VAR(p)$: $$\begin{eqnarray*} \Delta {\bf{y}}_{t}=\mu+\alpha\beta^{\prime} {\bf{y}}_{t-1}+\Gamma_{1}\Delta {\bf{y}}_{t-1}+\Gamma_{2}\Delta {\bf{y}}_{t-2}+ \ldots +\Gamma_{p-1}\Delta {\bf{y}}_{t-(p-1)} + {\bf{u}}_{t} \end{eqnarray*} $$ where $p$ is the number of lags of the vector of variables.

The trace statistic tests the null hypothesis of at most $r$ cointegration relations: $$\begin{eqnarray} \lambda_{trace}=-T\sum_{i=r+1}^{n}\log(1-\hat{\lambda}_{i}) \;\;\; r=0,1,2, \ldots , n-1 \end{eqnarray}$$

Where the alternative hypothesis is that there are more than $r$ cointegration relationships.

The maximum eigenvalue statistic for the null hypothesis of at most $r$ cointegration relations can be computed as:

$$\begin{eqnarray} \lambda_{max}=-T\log(1-\hat{\lambda}_{r+1}) \;\;\; r=0,1,2,\ldots, n-1 \end{eqnarray}$$

Where the alternative hypothesis is that there are $r + 1$ cointegration relations.

Correlation Without Cointegration¶

An example of two assets that have high levels of correlation but are diverging.

In [6]:
X_returns = np.random.normal(1, 1, 100)
Y_returns = np.random.normal(2, 1, 100)

X_diverging = pd.Series(np.cumsum(X_returns), name='X')
Y_diverging = pd.Series(np.cumsum(Y_returns), name='Y')

pd.concat([X_diverging, Y_diverging], axis=1).plot();

print('Correlation: ' + str(X_diverging.corr(Y_diverging)))
score, pvalue, _ = coint(X_diverging,Y_diverging)
print('Augmented Engle-Granger two-step cointegration test p-value: ' + str(pvalue))
print("\n")

df_diverging = pd.DataFrame({'x':X_diverging, 'y':Y_diverging})
jres = coint_johansen(df_diverging, 1, 0)

print('Cointegration Johansen test Trace Statistic & Critical Values:')
print(jres.trace_stat)
print(jres.trace_stat_crit_vals)
print("\n")
print('Cointegration Johansen test Eigen Statistic & Critical Values:')
print(jres.max_eig_stat)
print(jres.max_eig_stat_crit_vals)
Correlation: 0.991746840058381
Augmented Engle-Granger two-step cointegration test p-value: 0.4954913200436752


Cointegration Johansen test Trace Statistic & Critical Values:
[12.9791327   2.06845427]
[[16.1619 18.3985 23.1485]
 [ 2.7055  3.8415  6.6349]]


Cointegration Johansen test Eigen Statistic & Critical Values:
[10.91067843  2.06845427]
[[15.0006 17.1481 21.7465]
 [ 2.7055  3.8415  6.6349]]

Cointegration Without Correlation¶

An example of two assets with a high level of cointegration, due to mean reversion, illustrated with a normally distributed series and a square wave.

In [7]:
Y2 = pd.Series(np.random.normal(0, 1, 1000), name='Y2') + 20
Y3 = Y2.copy()
# Y2 = Y2 + 10
Y3[0:100] = 30
Y3[100:200] = 10
Y3[200:300] = 30
Y3[300:400] = 10
Y3[400:500] = 30
Y3[500:600] = 10
Y3[600:700] = 30
Y3[700:800] = 10
Y3[800:900] = 30
Y3[900:1000] = 10
Y2.plot()
Y3.plot()
plt.ylim([0, 40]);

# correlation is nearly zero
print('Correlation: ' + str(Y2.corr(Y3)))
score, pvalue, _ = coint(Y2,Y3)
print('Augmented Engle-Granger two-step cointegration test p-value: ' + str(pvalue))


df_coint = pd.DataFrame({'x':Y2, 'y':Y3})
jres = coint_johansen(df_coint, 1, 0)

print('Cointegration Johansen test Trace Statistic & Critical Values:')
print(jres.trace_stat)
print(jres.trace_stat_crit_vals)
print("\n")
print('Cointegration Johansen test Eigen Statistic & Critical Values:')
print(jres.max_eig_stat)
print(jres.max_eig_stat_crit_vals)
Correlation: 0.01863640841561092
Augmented Engle-Granger two-step cointegration test p-value: 0.0
Cointegration Johansen test Trace Statistic & Critical Values:
[688.05341812   9.31962876]
[[16.1619 18.3985 23.1485]
 [ 2.7055  3.8415  6.6349]]


Cointegration Johansen test Eigen Statistic & Critical Values:
[678.73378937   9.31962876]
[[15.0006 17.1481 21.7465]
 [ 2.7055  3.8415  6.6349]]

High Correlation and Cointegration¶

An example of two assets with high levels of both cointegration and correlation, with the spread between the two cointegrated time series converging around its long-run mean. Such assets are strong candidates for a pairs trading strategy.

In [8]:
X_returns = np.random.normal(0, 1, 100) # Generates simulated daily returns
X = pd.Series(np.cumsum(X_returns), name='X') + 50
some_noise = np.random.normal(0, 1, 100)
Y = X + 5 + some_noise
Y.name = 'Y'
pd.concat([X, Y], axis=1).plot();
print('Correlation: ' + str(X.corr(Y)))
score, pvalue, _ = coint(X,Y)
print('Augmented Engle-Granger two-step cointegration test p-value: ' + str(pvalue))


df_coint = pd.DataFrame({'x':X, 'y':Y})
jres = coint_johansen(df_coint, 1, 0)

print('Cointegration Johansen test Trace Statistic & Critical Values:')
print(jres.trace_stat)
print(jres.trace_stat_crit_vals)
print("\n")
print('Cointegration Johansen test Eigen Statistic & Critical Values:')
print(jres.max_eig_stat)
print(jres.max_eig_stat_crit_vals)
Correlation: 0.9712025708138258
Augmented Engle-Granger two-step cointegration test p-value: 6.633019964138976e-15
Cointegration Johansen test Trace Statistic & Critical Values:
[79.49905236  2.24612685]
[[16.1619 18.3985 23.1485]
 [ 2.7055  3.8415  6.6349]]


Cointegration Johansen test Eigen Statistic & Critical Values:
[77.25292551  2.24612685]
[[15.0006 17.1481 21.7465]
 [ 2.7055  3.8415  6.6349]]
In [9]:
(Y-X).plot() # Plot the spread
plt.axhline((Y-X).mean(), color='black', linestyle='--') # Add the mean
plt.axhline(5.5, color='red', linestyle='--') 
plt.axhline(4.5, color='green', linestyle='--')
Out[9]:
<matplotlib.lines.Line2D at 0x7f8e092cb6d0>

Implementing the Pairs Trading Strategy¶

Hedging¶

Definition: An investment position intended to offset and balance the potential risk of an investment by assuming a contrary or opposing position in a related market or instrument.

Short¶

Definition: Short selling is an investment or trading strategy that speculates on a decline in a stock or other security's price. Investors or portfolio managers may use short selling as a hedge against the risk of a long position in the same or a related security. The position is opened by borrowing shares of an asset that the investor believes will decrease in value; the investor sells the borrowed shares at the current market price, hoping to repurchase them at a lower price before returning them.

Long¶

Definition: A long position is an investment or trading strategy that speculates on the increase in a stock or other security's price. Investors or portfolio managers may use a long position as a hedge against the risk of a short position in the same or related security. An example of opening a long position is through the purchase of a long call options contract which gives the holder the option to buy the underlying asset at a certain price.

Pairs Trading¶

Securities with high cointegration and covariance tend to have an approximately stationary distance, or spread, although there will be periods when the distance between the two securities is high and periods when it is low. Pairs trading consists of maintaining a hedged position across X and Y: if both securities go down money is neither made nor lost, and likewise if both go up. Profit relies on the spread of the two securities reverting to its mean. When the spread between securities X and Y is unusually wide, the following positions are opened: short Y and long X. Similarly, when the spread is unusually narrow, the positions are: long Y and short X. Once the divergent spread reverts to the long-run mean, the positions are closed.

In [10]:
plt.figure(figsize = (15,8))
sns.heatmap(cryptoDF.corr(), cmap = "bone", linewidths = .7, linecolor = "black", square = True)
cryptoDF.corr()
Out[10]:
Symbols BTC-USD ETH-USD USDT-USD BNB-USD USDC-USD XRP-USD SOL-USD LUNA1-USD DOGE-USD SHIB-USD
Symbols
BTC-USD 1.000000 0.835622 -0.225157 0.837048 -0.161216 0.755747 0.622484 0.463182 0.606414 0.534158
ETH-USD 0.835622 1.000000 -0.235223 0.948488 -0.238271 0.812596 0.876003 0.694296 0.751839 0.795131
USDT-USD -0.225157 -0.235223 1.000000 -0.215600 0.659673 -0.180237 -0.172301 -0.169052 -0.181073 -0.145416
BNB-USD 0.837048 0.948488 -0.215600 1.000000 -0.163453 0.878858 0.779671 0.650092 0.795633 0.707161
USDC-USD -0.161216 -0.238271 0.659673 -0.163453 1.000000 -0.109548 -0.210163 -0.237028 -0.117340 -0.216401
XRP-USD 0.755747 0.812596 -0.180237 0.878858 -0.109548 1.000000 0.598313 0.395136 0.855008 0.479411
SOL-USD 0.622484 0.876003 -0.172301 0.779671 -0.210163 0.598313 1.000000 0.739120 0.473815 0.878414
LUNA1-USD 0.463182 0.694296 -0.169052 0.650092 -0.237028 0.395136 0.739120 1.000000 0.251514 0.734762
DOGE-USD 0.606414 0.751839 -0.181073 0.795633 -0.117340 0.855008 0.473815 0.251514 1.000000 0.409844
SHIB-USD 0.534158 0.795131 -0.145416 0.707161 -0.216401 0.479411 0.878414 0.734762 0.409844 1.000000
In [11]:
plt.figure(figsize = (15,8))
sns.heatmap(log_returns.corr(), cmap = "bone", linewidths = .7, linecolor = "black", square = True)
log_returns.corr()
Out[11]:
Symbols BTC-USD ETH-USD USDT-USD BNB-USD USDC-USD XRP-USD SOL-USD LUNA1-USD DOGE-USD SHIB-USD
Symbols
BTC-USD 1.000000 0.822166 0.017480 0.775882 0.003274 0.753971 0.542804 0.533103 0.610963 0.335197
ETH-USD 0.822166 1.000000 -0.010023 0.854602 -0.029232 0.754606 0.676725 0.603442 0.602307 0.335394
USDT-USD 0.017480 -0.010023 1.000000 0.037701 0.866222 0.030281 -0.125480 0.021541 -0.173571 0.123478
BNB-USD 0.775882 0.854602 0.037701 1.000000 0.048421 0.778611 0.629238 0.630778 0.599484 0.294332
USDC-USD 0.003274 -0.029232 0.866222 0.048421 1.000000 0.020428 -0.116809 0.023725 -0.165626 0.112289
XRP-USD 0.753971 0.754606 0.030281 0.778611 0.020428 1.000000 0.545541 0.554199 0.609099 0.228781
SOL-USD 0.542804 0.676725 -0.125480 0.629238 -0.116809 0.545541 1.000000 0.617261 0.491747 0.248133
LUNA1-USD 0.533103 0.603442 0.021541 0.630778 0.023725 0.554199 0.617261 1.000000 0.433719 0.221597
DOGE-USD 0.610963 0.602307 -0.173571 0.599484 -0.165626 0.609099 0.491747 0.433719 1.000000 0.267561
SHIB-USD 0.335197 0.335394 0.123478 0.294332 0.112289 0.228781 0.248133 0.221597 0.267561 1.000000
In [12]:
df = log_returns.iloc[: , 1:]
for a1 in df.columns:
    for a2 in df.columns:
        if a1 != a2:
            test_result = ts.coint(df[a1], df[a2])
            print(a1 + ' and ' + a2 + ': p-value = ' + str(test_result[1]))
BTC-USD and ETH-USD: p-value = 0.0
BTC-USD and USDT-USD: p-value = 0.0
BTC-USD and BNB-USD: p-value = 1.4070918112157003e-29
BTC-USD and USDC-USD: p-value = 0.0
BTC-USD and XRP-USD: p-value = 1.3973128870854195e-29
BTC-USD and SOL-USD: p-value = 1.3547440364454734e-29
BTC-USD and LUNA1-USD: p-value = 0.0
BTC-USD and DOGE-USD: p-value = 0.0
BTC-USD and SHIB-USD: p-value = 2.751879335694028e-06
ETH-USD and BTC-USD: p-value = 1.0849297995694754e-21
ETH-USD and USDT-USD: p-value = 1.690245679989267e-09
ETH-USD and BNB-USD: p-value = 1.7282083746756782e-29
ETH-USD and USDC-USD: p-value = 1.8119219853355185e-09
ETH-USD and XRP-USD: p-value = 8.310483343291036e-14
ETH-USD and SOL-USD: p-value = 3.383434553173779e-29
ETH-USD and LUNA1-USD: p-value = 0.0
ETH-USD and DOGE-USD: p-value = 4.727428574683182e-11
ETH-USD and SHIB-USD: p-value = 0.0
USDT-USD and BTC-USD: p-value = 5.824370153315442e-13
USDT-USD and ETH-USD: p-value = 3.675457792066264e-13
USDT-USD and BNB-USD: p-value = 7.422987816616695e-13
USDT-USD and USDC-USD: p-value = 1.2001264952695478e-14
USDT-USD and XRP-USD: p-value = 5.181598771950224e-13
USDT-USD and SOL-USD: p-value = 1.1449364619387984e-25
USDT-USD and LUNA1-USD: p-value = 5.326743541382849e-13
USDT-USD and DOGE-USD: p-value = 4.791135972745676e-21
USDT-USD and SHIB-USD: p-value = 8.27100616214492e-12
BNB-USD and BTC-USD: p-value = 6.598733543467639e-17
BNB-USD and ETH-USD: p-value = 5.278597073535519e-21
BNB-USD and USDT-USD: p-value = 1.2264557757575208e-06
BNB-USD and USDC-USD: p-value = 1.1815356857352212e-06
BNB-USD and XRP-USD: p-value = 4.212161014176728e-14
BNB-USD and SOL-USD: p-value = 7.988940565328788e-11
BNB-USD and LUNA1-USD: p-value = 3.104321657531003e-12
BNB-USD and DOGE-USD: p-value = 2.6476069393482235e-11
BNB-USD and SHIB-USD: p-value = 2.464748483299084e-06
USDC-USD and BTC-USD: p-value = 3.451982379758259e-16
USDC-USD and ETH-USD: p-value = 7.914248531274647e-17
USDC-USD and USDT-USD: p-value = 2.77108191706405e-15
USDC-USD and BNB-USD: p-value = 7.054056081190404e-15
USDC-USD and XRP-USD: p-value = 1.4635905960109235e-15
USDC-USD and SOL-USD: p-value = 4.3084593188635585e-14
USDC-USD and LUNA1-USD: p-value = 1.471941942781307e-15
USDC-USD and DOGE-USD: p-value = 2.141256091654443e-22
USDC-USD and SHIB-USD: p-value = 7.15977401540107e-11
XRP-USD and BTC-USD: p-value = 0.0
XRP-USD and ETH-USD: p-value = 1.0731917492321476e-20
XRP-USD and USDT-USD: p-value = 0.0
XRP-USD and BNB-USD: p-value = 6.374025456999343e-23
XRP-USD and USDC-USD: p-value = 0.0
XRP-USD and SOL-USD: p-value = 0.0
XRP-USD and LUNA1-USD: p-value = 0.0
XRP-USD and DOGE-USD: p-value = 1.8937618187450178e-29
XRP-USD and SHIB-USD: p-value = 8.74581054876333e-09
SOL-USD and BTC-USD: p-value = 6.0401427166477714e-24
SOL-USD and ETH-USD: p-value = 2.4969785175899078e-29
SOL-USD and USDT-USD: p-value = 3.9547623352314125e-12
SOL-USD and BNB-USD: p-value = 1.4641802278756945e-29
SOL-USD and USDC-USD: p-value = 5.1164113220893505e-12
SOL-USD and XRP-USD: p-value = 0.0
SOL-USD and LUNA1-USD: p-value = 0.0
SOL-USD and DOGE-USD: p-value = 0.0
SOL-USD and SHIB-USD: p-value = 0.0
LUNA1-USD and BTC-USD: p-value = 0.0
LUNA1-USD and ETH-USD: p-value = 0.0
LUNA1-USD and USDT-USD: p-value = 0.0
LUNA1-USD and BNB-USD: p-value = 1.3402775076848768e-29
LUNA1-USD and USDC-USD: p-value = 0.0
LUNA1-USD and XRP-USD: p-value = 0.0
LUNA1-USD and SOL-USD: p-value = 0.0
LUNA1-USD and DOGE-USD: p-value = 0.0
LUNA1-USD and SHIB-USD: p-value = 0.0
DOGE-USD and BTC-USD: p-value = 9.538829689489948e-16
DOGE-USD and ETH-USD: p-value = 6.476003396750355e-12
DOGE-USD and USDT-USD: p-value = 1.205398333327633e-07
DOGE-USD and BNB-USD: p-value = 1.294231640460153e-09
DOGE-USD and USDC-USD: p-value = 1.4743175368269807e-07
DOGE-USD and XRP-USD: p-value = 1.1785703214206945e-08
DOGE-USD and SOL-USD: p-value = 1.8948027775871622e-09
DOGE-USD and LUNA1-USD: p-value = 5.268640634896872e-08
DOGE-USD and SHIB-USD: p-value = 5.529561203722615e-09
SHIB-USD and BTC-USD: p-value = 1.4429788873934247e-06
SHIB-USD and ETH-USD: p-value = 7.43253753832311e-08
SHIB-USD and USDT-USD: p-value = 5.073586538247463e-07
SHIB-USD and BNB-USD: p-value = 1.9418165864334218e-07
SHIB-USD and USDC-USD: p-value = 5.514174519054891e-07
SHIB-USD and XRP-USD: p-value = 9.436987614645229e-07
SHIB-USD and SOL-USD: p-value = 2.7594733646046787e-07
SHIB-USD and LUNA1-USD: p-value = 1.8798113744637878e-06
SHIB-USD and DOGE-USD: p-value = 8.452949247874129e-07
In [13]:
A1 = df["BTC-USD"]
A2 = df["ETH-USD"]
A1 = sm.add_constant(A1)
results = sm.OLS(A2, A1).fit()
A1 = A1["BTC-USD"]
b = results.params["BTC-USD"]

spread = A2 - b * A1
spread.plot()
plt.axhline(spread.mean(), color='black')
plt.legend(['Spread']);

In [14]:
ratio = A1/A2
ratio.plot()
plt.axhline(ratio.mean(), color='black')
plt.legend(['Price Ratio']);
In [15]:
def zscore(series):
    return (series - series.mean()) / np.std(series)
zscore(spread).plot()
plt.axhline(zscore(spread).mean(), color='black')
plt.axhline(1.0, color='red', linestyle='--')
plt.axhline(-1.0, color='green', linestyle='--')
plt.legend(['Spread z-score', 'Mean', '+1', '-1']);

Trading Strategy Revised¶

  1. Short the spread when the z-score is above 1 (i.e. short sell one unit of A2 and buy $b$ units of A1)
  2. Long the spread when the z-score is below -1 (i.e. buy one unit of A2 and short sell $b$ units of A1)
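A minimal sketch of turning these z-score rules into trading signals. The entry threshold, the toy spread, and the helper name `pairs_signals` are illustrative assumptions, not a fitted strategy:

```python
import numpy as np
import pandas as pd

def pairs_signals(spread, entry=1.0):
    """Map a spread to positions: -1 = short the spread, +1 = long the spread."""
    z = (spread - spread.mean()) / spread.std()
    signals = pd.Series(0, index=spread.index)
    signals[z > entry] = -1   # spread rich: short A2, buy b units of A1
    signals[z < -entry] = 1   # spread cheap: buy A2, short b units of A1
    return signals

# Toy mean-reverting spread for demonstration
rng = np.random.default_rng(3)
toy_spread = pd.Series(rng.normal(size=300)).rolling(5, min_periods=1).mean()
sig = pairs_signals(toy_spread)
print(sig.value_counts())
```

In practice the mean and standard deviation would be estimated on a rolling window rather than the full sample, to avoid look-ahead bias when generating signals.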